In this guide you will acquire the skills needed to process and present spatial data in R. The objectives of the guide are as follows:
This lab guide follows closely and supplements the material presented in Chapters 1, 2.1, 2.2, 8 and 9 in the textbook Geocomputation with R (GWR) and class Handout 5.
Assignment 5 is due by 2:00 pm, February 15th on Canvas. See here for assignment guidelines. You must submit an .Rmd file and its associated .html file. Name the files: yourLastName_firstInitial_asgn05. For example: brazil_n_asgn05.
Download the Lab template into an appropriate folder on your hard drive (preferably, a folder named ‘Lab 5’), open it in RStudio, and type and run your code there. The template is also located on Canvas under Files. Change the title (“Lab 5”) and insert your name and date. Don’t change anything else inside the YAML (the stuff at the top between the --- lines). Also keep the grey chunk after the YAML. For a rundown on the use of R Markdown, see the assignment guidelines.
You’ll need to install the following packages in R. You only need to do this once, so if you’ve already installed these packages, skip the code. Also, don’t put these install.packages() commands in your R Markdown document. Copy and paste the code into the R Console. We’ll talk about what functions these packages provide as they come up in the guide.
install.packages("sf")
install.packages("tigris")
install.packages("tmap")
install.packages("RColorBrewer")
You’ll need to load the following packages using library(). Unlike installing, you will always need to load packages whenever you start a new R session.
library(tidyverse)
library(tidycensus)
library(flextable)
library(sf)
library(tigris)
library(tmap)
library(RColorBrewer)
The main package we will use for handling spatial data in R is the tidyverse-friendly sf package. sf stands for simple features. What is a feature? A feature is a thing, or an object, in the real world, such as a building or a tree. A county can be a feature, as can a city or a neighborhood. Features have a geometry describing where on Earth they are located, and they have attributes, which describe other properties. Think back to Lab 3, where we were working with counties. The difference between what we were doing then and what we will be doing in this lab is that the counties in Lab 3 had attributes (e.g., percent Hispanic, total population), but they did not have geometries. As such, we could not put them on a map because we didn’t have their specific geographic coordinates. This is what separates nonspatial and spatial data in R.
sf is the type of data object that stores spatial information in R. Think back to Lab 1, when we discussed the various ways R stores data: sf is just another way. Note, however, that spatial data outside of R can take many different formats. We’ll primarily be working with shapefiles in this class. Shapefiles are not the only type of spatial data, but they are the most commonly used. Let’s be clear here: sf objects are R-specific, whereas shapefiles are a general format for spatial data. Similarly, tibbles are R-specific, whereas csv files are a general format for nonspatial data.
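To see that a feature really is just attributes plus a geometry, here is a minimal sketch that builds a two-row sf object by hand with sf’s st_polygon(), st_sfc() and st_sf(). The tract IDs, population counts and square shapes are made up for illustration, and EPSG code 4269 (NAD83) is an assumption chosen for the example.

```r
library(sf)

# Two made-up square "tracts"; each ring must close on its first point
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))

# A feature = attributes (GEOID, totp) + a geometry; 4269 is the EPSG code for NAD83
toy_tracts <- st_sf(
  GEOID    = c("06067000001", "06067000002"),  # hypothetical IDs
  totp     = c(4000, 2500),                     # hypothetical attribute
  geometry = st_sfc(sq1, sq2, crs = 4269)
)

toy_tracts
class(toy_tracts)             # "sf" "data.frame"
st_drop_geometry(toy_tracts)  # attributes only: a plain data frame
```

The geometry column is what separates this object from the nonspatial county data of Lab 3; drop it and you are left with an ordinary data frame.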
We will be primarily working with census geographic data in this lab and pretty much all future labs. If you need a reminder of the Census geographies, go back to Handout 3. There are two major packages for bringing Census shapefiles into R: tidycensus and tigris.
In Lab 3, we worked with the tidycensus package and the Census API to bring Census data into R. Fortunately, we can use the same commands to bring in Census geographic data. First, load your Census API key. If you already installed your API key in a past lab using install = TRUE in census_api_key(), skip this step.
census_api_key("YOUR API KEY GOES HERE", install = TRUE)
Then use the get_acs() command to bring in California tract-level median household income, total foreign-born population, and total population from the 5-year 2017-2021 American Community Survey (ACS). Remember that an “E” at the end of a variable name indicates “Estimate” and an “M” indicates “margin of error”.
ca.tracts <- get_acs(geography = "tract",
                     year = 2021,
                     variables = c(medincome = "B19013_001",
                                   fb = "B05012_003", totp = "B05012_001"),
                     state = "CA",
                     survey = "acs5",
                     output = "wide",
                     geometry = TRUE)
The only difference between the code above and what we used in Lab 3 is one additional argument in the get_acs() command: geometry = TRUE. This argument tells R to bring in the spatial features associated with the geography you specified, in our case California tracts. You can further narrow your geographic scope to a single county by adding the county = argument. For example, to get just Sacramento County tracts, you would add county = "Sacramento". Type in ca.tracts to see what we’ve got.
ca.tracts
## Simple feature collection with 9129 features and 8 fields (with 20 geometries empty)
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.4096 ymin: 32.53444 xmax: -114.1312 ymax: 42.00948
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME medincomeE
## 1 06037294610 Census Tract 2946.10, Los Angeles County, California 63348
## 2 06059011402 Census Tract 114.02, Orange County, California 127847
## 3 06019001304 Census Tract 13.04, Fresno County, California 27143
## 4 06037273700 Census Tract 2737, Los Angeles County, California 116393
## 5 06037207501 Census Tract 2075.01, Los Angeles County, California 90871
## 6 06053010900 Census Tract 109, Monterey County, California NA
## 7 06059087403 Census Tract 874.03, Orange County, California 56203
## 8 06037242100 Census Tract 2421, Los Angeles County, California 23214
## 9 06037408134 Census Tract 4081.34, Los Angeles County, California 125129
## 10 06001451704 Census Tract 4517.04, Alameda County, California 145556
## medincomeM fbE fbM totpE totpM geometry
## 1 21488 1726 602 4434 871 MULTIPOLYGON (((-118.2633 3...
## 2 17181 550 211 3198 488 MULTIPOLYGON (((-117.9137 3...
## 3 4611 1602 368 5567 888 MULTIPOLYGON (((-119.7546 3...
## 4 22958 421 173 2934 357 MULTIPOLYGON (((-118.4592 3...
## 5 23167 746 182 2022 328 MULTIPOLYGON (((-118.2559 3...
## 6 NA 1760 263 8489 506 MULTIPOLYGON (((-121.3912 3...
## 7 4488 1186 157 3001 283 MULTIPOLYGON (((-117.9154 3...
## 8 6097 583 147 2648 447 MULTIPOLYGON (((-118.2367 3...
## 9 27906 1283 224 2833 400 MULTIPOLYGON (((-117.9046 3...
## 10 16888 701 244 4282 501 MULTIPOLYGON (((-121.7985 3...
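The object looks much like a basic tibble, but with a few differences. The header identifies it as a simple feature collection and reports its geometry type, dimension, bounding box, and coordinate reference system (here the geodetic CRS NAD83). The object also has a geometry column, which stores each tract’s boundaries in MULTIPOLYGON form (as opposed to points or lines, the other basic vector data forms, which were discussed in Handout 5). We can find out the object’s class: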
class(ca.tracts)
## [1] "sf" "data.frame"
Here we find that the object is an sf data frame.
Another package that allows us to bring in census geographic boundaries is tigris. Here is a list of all the geographies you can download through this package. Let’s bring in the boundaries for Sacramento city. Remember from Handout 3 that cities are designated as places by the Census. Use the places() function to get all places in California.
pl <- places(state = "CA", cb = TRUE, year=2021)
The cb = TRUE argument tells R to download a generalized cartographic boundary file, which drastically reduces the size of the data (compare the file size when you don’t include cb = TRUE). For example, it eliminates all areas that are strictly covered by water (e.g., lakes). The argument year = 2021 tells R to bring in the boundaries for that year (census geographies can change from year to year). When using the multi-year ACS, it is best to use the end year of the period. In the get_acs() command above we used year = 2021, so we also use year = 2021 in the places() command. Note that unlike tidycensus, tigris does not allow you to attach attribute data (e.g., percent Hispanic, total population) to geometric features.
Take a glimpse of pl.
glimpse(pl)
## Rows: 1,611
## Columns: 13
## $ STATEFP <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ PLACEFP <chr> "70098", "69088", "62602", "66000", "02000", "64434", "0330…
## $ PLACENS <chr> "02411827", "02411819", "02410986", "02411782", "02409704",…
## $ AFFGEOID <chr> "1600000US0670098", "1600000US0669088", "1600000US0662602",…
## $ GEOID <chr> "0670098", "0669088", "0662602", "0666000", "0602000", "066…
## $ NAME <chr> "Santa Rosa", "Santa Clarita", "Rolling Hills", "San Diego"…
## $ NAMELSAD <chr> "Santa Rosa city", "Santa Clarita city", "Rolling Hills cit…
## $ STUSPS <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",…
## $ STATE_NAME <chr> "California", "California", "California", "California", "Ca…
## $ LSAD <chr> "25", "25", "25", "25", "25", "43", "25", "25", "25", "25",…
## $ ALAND <dbl> 110145874, 183250290, 7746141, 844031898, 130206498, 692881…
## $ AWATER <dbl> 451570, 160454, 0, 120532678, 1567025, 0, 0, 646717, 0, 0, …
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-122.7624 3..., MULTIPOLYGON (…
We see a geometry column, which indicates we have a spatial data frame.
We can use filter() to keep Sacramento city. We will filter on the variable NAME to keep Sacramento.
sac.city <- filter(pl, NAME == "Sacramento")
The argument NAME == "Sacramento" tells R to keep cities with the exact city name “Sacramento”. Make sure we got what we wanted.
glimpse(sac.city)
## Rows: 1
## Columns: 13
## $ STATEFP <chr> "06"
## $ PLACEFP <chr> "64000"
## $ PLACENS <chr> "02411751"
## $ AFFGEOID <chr> "1600000US0664000"
## $ GEOID <chr> "0664000"
## $ NAME <chr> "Sacramento"
## $ NAMELSAD <chr> "Sacramento city"
## $ STUSPS <chr> "CA"
## $ STATE_NAME <chr> "California"
## $ LSAD <chr> "25"
## $ ALAND <dbl> 255399993
## $ AWATER <dbl> 5413141
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-121.5601 3...
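Exact matching matters here because other California place names contain the word Sacramento (West Sacramento, for example). The sketch below uses a hypothetical three-row stand-in for pl, holding only a NAME column, to contrast == with a substring match via base R’s grepl().

```r
library(dplyr)

# Hypothetical stand-in for pl: just a few place names
toy_pl <- tibble(NAME = c("Sacramento", "West Sacramento", "Davis"))

exact   <- filter(toy_pl, NAME == "Sacramento")       # only the city itself
partial <- filter(toy_pl, grepl("Sacramento", NAME))  # also catches West Sacramento

nrow(exact)    # 1
nrow(partial)  # 2
```

Checking that glimpse() reports Rows: 1 after the filter is a quick way to confirm you kept exactly the place you wanted.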
Let’s use the function counties() to bring in county boundaries.
cnty <- counties(state = "CA", cb = TRUE, year=2021)
Take a look at the data to make sure it contains what we want.
glimpse(cnty)
## Rows: 58
## Columns: 13
## $ STATEFP <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ COUNTYFP <chr> "059", "111", "063", "015", "023", "043", "037", "065", "10…
## $ COUNTYNS <chr> "00277294", "00277320", "00277296", "01682074", "01681908",…
## $ AFFGEOID <chr> "0500000US06059", "0500000US06111", "0500000US06063", "0500…
## $ GEOID <chr> "06059", "06111", "06063", "06015", "06023", "06043", "0603…
## $ NAME <chr> "Orange", "Ventura", "Plumas", "Del Norte", "Humboldt", "Ma…
## $ NAMELSAD <chr> "Orange County", "Ventura County", "Plumas County", "Del No…
## $ STUSPS <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",…
## $ STATE_NAME <chr> "California", "California", "California", "California", "Ca…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ ALAND <dbl> 2053476505, 4767622161, 6612400772, 2606118035, 9241565229,…
## $ AWATER <dbl> 406279630, 947345735, 156387638, 578742633, 1253726036, 362…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-118.1144 3..., MULTIPOLYGON (…
To get Sacramento county, we use the filter() function. Similar to pl, we will filter on the variable NAME to keep Sacramento.
sac.county <- filter(cnty, NAME == "Sacramento")
Guess what? You earned another badge! Yippee!!

Directly reading spatial files using an API is great, but many spatial data sources don’t offer an API. You’ll often have to download a spatial data set, save it onto your hard drive, and read it into R. The function for reading spatial files from your hard drive as sf objects is st_read().
Let’s bring in two shapefiles I created that contain (1) median housing values for census tracts in Sacramento County and (2) Sacramento County parks. I zipped up the files and uploaded the zip file onto GitHub. Make sure your current working directory is pointed to the appropriate folder on your hard drive (use setwd()). Use the following code to download and unzip the file.
download.file(url = "https://raw.githubusercontent.com/crd150/data/master/lab5files.zip", destfile = "lab5files.zip")
unzip(zipfile = "lab5files.zip")
Don’t worry if you don’t understand these commands - they are more for you to simply copy and paste so that you can download files that I zipped up and uploaded onto Github. You can look at the help documentation for each function if you are curious.
If you are having problems with the above code, I also uploaded the zip file on Canvas in the Week 5 Lab folder. To manually unzip zipped files on a Mac, check here. For Windows, check here.
You should see SacramentoCountyTracts, Parks and californiatractsrace files in your current working directory (type in getwd() to find where these files reside on your hard drive). Note that a shapefile is not actually a single file but a set of files. For SacramentoCountyTracts, you should see four files named SacramentoCountyTracts with shp, dbf, prj, and shx extensions. These files are all connected to one another, so don’t manually alter them; only open and alter them in R. Moreover, if you want to remove a shapefile from your hard drive, delete all the associated files, not just one. For Parks, you will see five associated files. We will bring in the nonspatial data file californiatractsrace.csv a little later.
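Because a shapefile is a family of files sharing one base name, deleting or copying it by hand means handling every member. The sketch below is a hypothetical helper, shapefile_parts(), written in base R; it assumes all parts sit in the same folder and differ only in a single extension. The demo creates empty placeholder files in a temporary folder, so nothing in your lab folder is touched.

```r
# Hypothetical helper: all files in the same folder that share the .shp's base name
shapefile_parts <- function(shp_path) {
  base  <- tools::file_path_sans_ext(basename(shp_path))
  files <- list.files(dirname(shp_path), full.names = TRUE)
  files[tools::file_path_sans_ext(basename(files)) == base]
}

# Demo with empty placeholder files in a temporary folder
demo_dir <- file.path(tempdir(), "shp_demo")
dir.create(demo_dir, showWarnings = FALSE)
exts <- c("shp", "dbf", "prj", "shx")
invisible(file.create(file.path(demo_dir, paste0("SacramentoCountyTracts.", exts))))
invisible(file.create(file.path(demo_dir, "Parks.shp")))

parts <- shapefile_parts(file.path(demo_dir, "SacramentoCountyTracts.shp"))
basename(parts)                # the four SacramentoCountyTracts files; Parks.shp excluded
invisible(file.remove(parts))  # removes the whole shapefile, not just the .shp
```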
Make sure your working directory is now pointed to the lab5files folder. Then bring in the Sacramento County tract shapefile using the function st_read(), which is from the sf package. You’ll need to add the .shp extension so that the function knows it’s reading in a shapefile.
sac.county.tracts <- st_read("SacramentoCountyTracts.shp", stringsAsFactors = FALSE)
The argument stringsAsFactors = FALSE tells R to keep any variables that look like a character as a character and not a factor, which we won’t use much, if at all, in this class.
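A quick base-R illustration of the difference between the two settings (the tract names are made up):

```r
# Same column, stored two ways
df_chr <- data.frame(NAME = c("Tract 1", "Tract 2"), stringsAsFactors = FALSE)
df_fct <- data.frame(NAME = c("Tract 1", "Tract 2"), stringsAsFactors = TRUE)

class(df_chr$NAME)   # "character": plain text
class(df_fct$NAME)   # "factor": integer codes with labels
levels(df_fct$NAME)  # "Tract 1" "Tract 2"
```

Since R 4.0.0, FALSE is the default for data.frame(), so the argument mainly guards against older defaults.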
Take a look.
glimpse(sac.county.tracts)
## Rows: 363
## Columns: 4
## $ GEOID <chr> "06067008119", "06067006102", "06067009651", "06067009644", "…
## $ NAME <chr> "Census Tract 81.19, Sacramento County, California", "Census …
## $ medhval <dbl> 346100, 396500, 477800, 455600, 289600, 290100, 276900, 41600…
## $ geometry <POLYGON [US_survey_foot]> POLYGON ((6758527 2009543, ..., POLYGON …
It has a geometry column, so we know it is spatial. Bring in the parks file.
parks <- st_read("Parks.shp", stringsAsFactors = FALSE)
Take a look.
glimpse(parks)
## Rows: 741
## Columns: 43
## $ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ PARK_ID <int> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010,…
## $ LANDMARK <chr> "ANCIL HOFFMAN GOLF COURSE", "HOGBACK ISLAND FISHING ACCESS…
## $ DESCRIPTIO <chr> "GOLF COURSE", "BOAT LAUNCH", "PARKWAY", "PARK", "PARK", "P…
## $ DISTRICT <chr> NA, "SAC COUNTY", "SAC COUNTY", "COSUMNES CSD", "CITY OF SA…
## $ PARK <chr> "Ancil Hoffman Golf Course", "Hogback Island Fishing Access…
## $ PARK_ADDRE <chr> "6700 TARSHES DR", "1500 GRAND ISLAND RD", "7929 LA RIVIERA…
## $ ZIP_CODE <chr> "95608", "95690", "95864", "95757", "95833", "95670", "9582…
## $ AMPHITHEAT <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PUBLIC_POO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LAP_SWIM <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SWIM_LESSO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SPRAY_PARK <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DOG_PARK <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ART_CULTUR <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DESTINATIO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DISC_GOLF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GOLF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HORSEBACK_ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_FAC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_F_1 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_F_2 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LIGHTED_TE <chr> NA, NA, NA, NA, NA, NA, NA, "Yes", NA, NA, NA, NA, NA, NA, …
## $ NATURE_CEN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PICNIC_REN <chr> NA, NA, NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, "Y…
## $ SENIOR_PRO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SKATE_BIKE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SPORTS_COM <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WALKING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ CYCLING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ RUNNING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ VOLUNTEER_ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ YEAR_ROUND <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ YEAR_ROU_1 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ZOO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ UPDATED <chr> NA, "8/2012", "12/3/12", NA, "12/2012", "12/3/12", NA, "12/…
## $ NOTES <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LONG_DISTR <chr> NA, "Sacramento County Regional Parks", "Sacramento County …
## $ PARK_DISTR <chr> NA, "22", "22", "17", "06", "22", NA, "06", "22", NA, "06",…
## $ ADDRESSID <int> 553142, 310123, 207322, 302976, 538217, 199945, NA, 417668,…
## $ SHAPE_Leng <dbl> 11904.8115, 3977.6496, 12451.2700, 876.5463, 1308.4914, 511…
## $ SHAPE_Area <dbl> 7130139.52, 445005.74, 1556528.08, 43736.12, 102181.60, 118…
## $ geometry <MULTIPOLYGON [US_survey_foot]> MULTIPOLYGON (((6759145 198..., M…
There is a lot going on behind the curtain of how R handles spatial data as simple features, but the main takeaway is that sf objects are data frames. This means you can use many of the functions we’ve learned in the past couple of labs to manipulate sf objects, including our best buddy, the pipe %>% operator. For example, let’s do the following data wrangling tasks on ca.tracts: drop the margin of error variables (medincomeM, fbM, totpM), rename the estimate variables to remove the E suffix, and create a new variable pfb, the proportion of the population that is foreign-born.
We do all of this in one continuous chain of code using the pipe operator %>%.
ca.tracts <- ca.tracts %>%
  select(-medincomeM, -fbM, -totpM) %>%
  rename(medincome = medincomeE, fb = fbE, totp = totpE) %>%
  mutate(pfb = fb/totp)
Notice that we’ve already used all of the functions above for nonspatial data wrangling.
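To try the same pipeline without downloading anything, the sketch below applies it to a plain tibble holding the first two tracts from the ca.tracts printout above (GEOIDs, estimates and margins copied from that output; the geometry column is omitted). Note that pfb is a proportion between 0 and 1; multiply by 100 if you want a percentage.

```r
library(dplyr)

# First two rows of ca.tracts, attributes only (values from the printed output)
toy <- tibble(
  GEOID      = c("06037294610", "06059011402"),
  medincomeE = c(63348, 127847), medincomeM = c(21488, 17181),
  fbE        = c(1726, 550),     fbM        = c(602, 211),
  totpE      = c(4434, 3198),    totpM      = c(871, 488)
)

toy_w <- toy %>%
  select(-medincomeM, -fbM, -totpM) %>%                      # drop margins of error
  rename(medincome = medincomeE, fb = fbE, totp = totpE) %>% # drop the E suffix
  mutate(pfb = fb / totp)                                    # proportion foreign-born

toy_w
```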
Another important operation is to join attribute data to an sf object. For example, let’s say you wanted to add tract level percent race/ethnicity, which is located in the californiatractsrace.csv file we downloaded earlier. Bring this file in using our familiar friend read_csv().
ca.race <- read_csv("californiatractsrace.csv")
Remember, we’re dealing with data frames here, so we can use left_join(), which we covered in Lab 3, to join the nonspatial data frame ca.race to the spatial data frame sac.county.tracts.
sac.county.tracts <- sac.county.tracts %>%
  left_join(ca.race, by = "GEOID")
sac.county.tracts
## Simple feature collection with 363 features and 7 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 6601243 ymin: 1768598 xmax: 6840876 ymax: 2030363
## Projected CRS: NAD83 / California zone 2 (ftUS)
## First 10 features:
## GEOID NAME medhval
## 1 06067008119 Census Tract 81.19, Sacramento County, California 346100
## 2 06067006102 Census Tract 61.02, Sacramento County, California 396500
## 3 06067009651 Census Tract 96.51, Sacramento County, California 477800
## 4 06067009644 Census Tract 96.44, Sacramento County, California 455600
## 5 06067008129 Census Tract 81.29, Sacramento County, California 289600
## 6 06067009610 Census Tract 96.10, Sacramento County, California 290100
## 7 06067005101 Census Tract 51.01, Sacramento County, California 276900
## 8 06067009310 Census Tract 93.10, Sacramento County, California 416000
## 9 06067008907 Census Tract 89.07, Sacramento County, California 329500
## 10 06067009335 Census Tract 93.35, Sacramento County, California 445400
## pnhwhite pnhblk pnhasn phisp geometry
## 1 51.25090 6.030369 1.822126 34.27332 POLYGON ((6758527 2009543, ...
## 2 45.88656 7.332652 20.107307 22.86663 POLYGON ((6732697 1986050, ...
## 3 27.30239 7.650471 45.141407 14.37636 POLYGON ((6728531 1908088, ...
## 4 34.75242 9.480676 23.973430 20.77295 POLYGON ((6711842 1915513, ...
## 5 65.45064 10.443491 2.038627 18.95565 POLYGON ((6753713 2019602, ...
## 6 17.02577 16.223067 23.855795 30.13660 POLYGON ((6720102 1931004, ...
## 7 11.82903 12.763419 31.669980 26.77932 POLYGON ((6730827 1943021, ...
## 8 51.74938 6.265554 16.586151 22.61748 POLYGON ((6741750 1911390, ...
## 9 64.46149 5.528923 7.941834 16.74656 POLYGON ((6763734 1984521, ...
## 10 12.13678 7.962755 38.706052 28.06229 POLYGON ((6730432 1928719, ...
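A join like this only works if GEOID has the same type, with the same leading zeros, in both tables. California GEOIDs start with 0 (the state FIPS code is 06), and a CSV reader that guesses column types may turn them into numbers and drop that zero; dplyr’s left_join() then refuses to match a character key to a numeric one. The base-R sketch below uses two rows copied from the output above and shows the problem, two fixes (forcing GEOID to character when reading, or padding a numeric GEOID back to 11 digits), and the join. With readr’s read_csv(), the equivalent of colClasses is col_types = cols(GEOID = col_character()). Whether californiatractsrace.csv needs this depends on how the file stores GEOID.

```r
library(dplyr)

# Two rows of tract data written as CSV text (values from the output above)
csv_text <- "GEOID,pnhwhite\n06067008119,51.25090\n06067006102,45.88656"

guessed <- read.csv(text = csv_text)                                     # GEOID guessed as numeric
kept    <- read.csv(text = csv_text, colClasses = c(GEOID = "character")) # GEOID kept as text

is.numeric(guessed$GEOID)  # TRUE: the leading 0 is gone
kept$GEOID                 # "06067008119" "06067006102"

# Alternative repair: pad a numeric GEOID back to 11 digits
repaired <- sprintf("%011.0f", guessed$GEOID)

# Join on matching character keys (medhval values from the output above)
tracts <- data.frame(GEOID   = c("06067008119", "06067006102"),
                     medhval = c(346100, 396500))
joined <- left_join(tracts, kept, by = "GEOID")
joined
```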
Note that you cannot join two sf objects together using left_join(). Use left_join() only to join a regular nonspatial data frame to a spatial object. For example, you will get an error if you try to join ca.tracts to sac.county.tracts:
sac.county.tracts <- sac.county.tracts %>%
  left_join(ca.tracts, by = "GEOID")
## Error: y should not have class sf; for spatial joins, use st_join
The error message tells you that you can use the function st_join() to join two spatial data objects. We’ll learn more about st_join() in the next lab.
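When you only need the attributes of the second object, one common workaround is to drop its geometry first with st_drop_geometry(), which turns it into an ordinary data frame that left_join() accepts. A minimal sketch with two hypothetical sf objects that share made-up squares and GEOIDs (the pfb values are also made up):

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
geom <- st_sfc(square(0), square(1), crs = 4269)

x <- st_sf(GEOID = c("A", "B"), medhval = c(346100, 396500), geometry = geom)
y <- st_sf(GEOID = c("A", "B"), pfb = c(0.39, 0.17), geometry = geom)

# left_join(x, y, by = "GEOID") would fail because y is also an sf object.
joined <- left_join(x, st_drop_geometry(y), by = "GEOID")
class(joined)  # includes "sf": x keeps its geometry
```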
You can also use the exploratory data analysis functions we learned about in Lab 4. For example, what is the correlation between median housing value and percent non-Hispanic Black?
sac.county.tracts %>%
  summarize(Correlation = cor(medhval, pnhblk, use = "complete.obs"))
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 6601243 ymin: 1768598 xmax: 6840876 ymax: 2030363
## Projected CRS: NAD83 / California zone 2 (ftUS)
## Correlation geometry
## 1 -0.393557 POLYGON ((6608864 1788526, ...
We get the correlation, but it’s not quite as clean as when we calculated descriptive statistics in Lab 4. That’s because the object that summarize() returns is still a spatial object. To coerce it into a nonspatial object, use the function st_drop_geometry(), which, unsurprisingly, drops the object’s geometry. You can then use flextable() to create presentation-ready tables.
sac.county.tracts %>%
  summarize(Correlation = cor(medhval, pnhblk, use = "complete.obs")) %>%
  st_drop_geometry() %>%
  flextable()
| Correlation |
|---|
| -0.393557 |
The main takeaway: sf objects are data frames, so you can use many of the functions you’ve learned in the past couple of labs on these objects.
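The use = "complete.obs" argument matters whenever a tract has a missing value (ca.tracts, for example, has an NA median income for the Monterey County tract shown earlier). A small base-R sketch shows why: a single NA makes the default correlation NA, while complete.obs drops incomplete pairs first. A related tip: because summarize() on an sf object also merges all the geometries into one polygon, calling st_drop_geometry() before summarize() rather than after should give the same number while skipping that merge.

```r
# medhval and pnhblk for five tracts from the sac.county.tracts output;
# the fourth medhval is set to NA purely for illustration
medhval <- c(346100, 396500, 477800, NA, 289600)
pnhblk  <- c(6.030369, 7.332652, 7.650471, 9.480676, 10.443491)

r_default  <- cor(medhval, pnhblk)                        # NA: one missing value spoils it
r_complete <- cor(medhval, pnhblk, use = "complete.obs")  # uses the 4 complete pairs

keep     <- !is.na(medhval) & !is.na(pnhblk)
r_manual <- cor(medhval[keep], pnhblk[keep])              # same as r_complete

# With the sf object, the geometry-first order would be:
# sac.county.tracts %>% st_drop_geometry() %>%
#   summarize(Correlation = cor(medhval, pnhblk, use = "complete.obs"))
```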
To save an sf object to a file, use the function st_write() and specify at least two arguments: the sf object you want to save and a file name in quotes with the file extension. You’ll also need to specify delete_layer = TRUE, which overwrites the file if it already exists in your current working directory. Make sure you’ve set your directory to the folder you want your file to be saved in. Type in getwd() to see your current directory and use setwd() to set the directory.
Let’s save sac.county.tracts as a shapefile named saccountytractslab5.shp.
st_write(sac.county.tracts, "saccountytractslab5.shp", delete_layer = TRUE)
Check your current working directory to see if the file saccountytractslab5.shp was saved.
You can save your sf object in a number of different data formats other than shp. We won’t be concerned too much with these other formats in this class, but you can see a list of them here.
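To see what st_write() produces without touching your lab folder, the sketch below writes a one-tract, made-up sf object to R’s temporary directory, writes it a second time (delete_layer = TRUE lets the second write replace the first), and reads it back with st_read(). The listing at the end shows that a single call creates several files sharing one base name. The file name toytracts.shp is hypothetical.

```r
library(sf)

# One made-up tract (unit square) in NAD83
sq  <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy <- st_sf(GEOID = "06067000001", medhval = 346100,
             geometry = st_sfc(sq, crs = 4269))

out <- file.path(tempdir(), "toytracts.shp")
st_write(toy, out, delete_layer = TRUE, quiet = TRUE)
st_write(toy, out, delete_layer = TRUE, quiet = TRUE)  # overwrites the existing file

back <- st_read(out, quiet = TRUE)
list.files(tempdir(), pattern = "^toytracts\\.")  # .dbf, .prj, .shp, .shx (GDAL may add others)
```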
Now that you’ve got your spatial data in and wrangled, the next natural step is to map something. There are several packages in R that can be used for mapping. We won’t go through all of them, but Table 9.1 in GWR outlines the range of mapping packages available in R. The package we’ll rely on in this class for mapping is tmap.
The foundation function for mapping in tmap is tm_shape(). You then build on tm_shape() by adding one or more elements, all taking on the form of tm_. Let’s make a choropleth map of median housing values.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile")

You first put the dataset sac.county.tracts inside tm_shape(). Because you are plotting polygons, you use tm_polygons() next. If you are plotting points, you will use tm_dots(). If you are plotting lines, you will use tm_lines(). The argument col = "medhval" tells R to shade the tracts by the variable medhval. tmap allows users to specify the classification style with the style argument. The argument style = "quantile" tells R to break up the shading into quantiles: classes that each contain roughly the same number of tracts (five classes by default). Seven of the most useful classification styles are described in the bullet points below (taken from GWR):
- style = pretty, the default setting, rounds breaks into whole numbers where possible and spaces them evenly
- style = equal divides input values into bins of equal range, and is appropriate for variables with a uniform distribution (not recommended for variables with a skewed distribution, as the resulting map may end up having little color diversity)
- style = quantile ensures the same number of observations fall into each category (with the potential downside that bin ranges can vary widely)
- style = jenks identifies groups of similar values in the data and maximizes the differences between categories
- style = cont (and order) present a large number of colors over continuous color fields, and are particularly suited for continuous rasters (order can help visualize skewed distributions)
- style = sd divides the values by standard deviations above and below the mean
- style = cat was designed to represent categorical values and assures that each category receives a unique color

The importance of choosing the appropriate classification scheme is discussed in Handout 5. You’ll get some practice trying out other classification schemes in this week’s assignment. Note that tmap is smart enough to detect the presence of missing values; it shades them gray and labels them in the legend.
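The difference between the quantile and equal styles is easy to see with a handful of numbers. The base-R sketch below computes five quantile breaks and five equal-interval breaks for ten made-up, right-skewed housing values and counts how many fall in each class. tmap computes its own breaks internally (via the classInt package), so treat this as an illustration of the idea rather than tmap’s exact code.

```r
# Ten made-up, right-skewed housing values (one very expensive tract)
x <- c(150, 180, 200, 210, 230, 250, 270, 300, 400, 900) * 1000
n <- 5  # number of classes

q_breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))  # quantile style
e_breaks <- seq(min(x), max(x), length.out = n + 1)             # equal style

q_counts <- table(cut(x, q_breaks, include.lowest = TRUE))
e_counts <- table(cut(x, e_breaks, include.lowest = TRUE))

q_counts  # 2 values in every class
e_counts  # 8, 1, 0, 0, 1: most tracts share one color
```

This is the skewed-distribution problem the equal bullet warns about: with equal intervals, one extreme value stretches the range and squeezes most tracts into the lowest class.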
You can overlay multiple features on one map. For example, we can add park polygons on top of county tracts, providing a visual association between parks and median housing values. Here, we add another tm_shape() and tm_polygons() to the above code.
tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile") +
  tm_shape(parks) +
  tm_polygons(col = "green")

Don’t like the yellow/brown color scheme? We can change the color scheme using the argument palette = within tm_polygons(). The argument palette = defines the color ranges associated with the bins as determined by the style argument. Below we use the color scheme “Reds” using style = "quantile".
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile", palette = "Reds")

See Ch. 9.2.4 in GWR for a fuller discussion on color and other schemes you can specify.
In addition to the built-in palettes, customized color ranges can be created by specifying a vector with the desired colors as anchors. This creates a spectrum of colors in the map that ranges between the colors specified in the vector. For instance, if we used c(“red”, “blue”), the color spectrum would move from red to purple, then to blue, with in-between shades. In our example:
tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette = c("red", "blue"))

Not exactly a pretty picture. In order to capture a diverging scale, we insert “white” in between red and blue.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = c("red","white", "blue"))
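Base R’s colorRampPalette() does the same kind of interpolation between anchor colors, which makes it a handy way to preview what a custom vector will produce. This is a sketch of the idea; tmap’s own interpolation may differ in detail.

```r
two_way   <- colorRampPalette(c("red", "blue"))(5)
diverging <- colorRampPalette(c("red", "white", "blue"))(5)

two_way    # red, shades of purple, blue
diverging  # red, pink, white, light blue, blue

col2rgb(two_way[3])  # middle color: equal red and blue, no green (a purple)
diverging[3]         # the white anchor sits in the middle
```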

A preferred approach to selecting a color palette is to choose one of the schemes contained in the RColorBrewer package. These are based on the research of cartographer Cynthia Brewer (see the colorbrewer2 web site for details). RColorBrewer makes a distinction between sequential scales (for a scale that goes from low to high), diverging scales (to highlight how values differ from a central tendency), and qualitative scales (for categorical variables). For each scale, a series of single hue and multi-hue scales are suggested. In the RColorBrewer package, these are referred to by a name (e.g., the “Reds” palette we used above is an example). The full list is contained in the RColorBrewer documentation.
There are two very useful commands in this package. One sets a color palette by specifying its name and the number of desired categories. The result is a character vector with the hex codes of the corresponding colors.
For example, we select a sequential color scheme going from blue to green, as BuGn, by means of the command brewer.pal, with the number of categories (6) and the scheme as arguments. The resulting vector contains the HEX codes for the colors.
pal <- brewer.pal(6,"BuGn")
pal
## [1] "#EDF8FB" "#CCECE6" "#99D8C9" "#66C2A4" "#2CA25F" "#006D2C"
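Two more things are often handy with brewer.pal(): reversing a palette with base R’s rev(), and browsing brewer.pal.info, a data frame in RColorBrewer that lists each palette’s maximum number of colors, its type (sequential, diverging, qualitative) and whether it is colorblind-safe.

```r
library(RColorBrewer)

pal     <- brewer.pal(6, "BuGn")
pal_rev <- rev(pal)  # dark-to-light instead of light-to-dark

brewer.pal.info["BuGn", ]  # maxcolors, category, colorblind

# Names of all sequential palettes
seq_pals <- rownames(brewer.pal.info)[brewer.pal.info$category == "seq"]
seq_pals
```

Because palette = also accepts a vector of colors, as in the red/blue examples above, a reversed vector like pal_rev should work there as well.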
Passing the palette name "BuGn" to tm_polygons() yields the following result.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile", palette="BuGn")

The command display.brewer.pal() allows us to explore different color schemes before applying them to a map. For example:
display.brewer.pal(6,"BuGn")

There are many options to change the formatting of the legend. Often, the automatic title for the legend is not intuitive, since it is simply the variable name (in our case, medhval). This can be customized by setting the title argument in tm_polygons(). Let’s change the legend title to “Housing values”
tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values")

Another important aspect of the legend is its positioning. This is handled through the tm_layout() function. This function has a vast number of options; check its help documentation for the complete list of settings, and see Ch. 9.2.5 in GWR for examples. There are also specialized subsets of layout functions, focused on specific aspects of the map, such as tm_legend(), tm_style() and tm_format(). Below we illustrate the positioning of the legend.
The default is to position the legend inside the map. Often, this default solution is appropriate, but sometimes further control is needed. The legend.position argument in the tm_layout() function moves the legend around the map. It takes a vector of two strings that determine both the horizontal position (“left”, “right”, or “center”) and the vertical position (“top”, “bottom”, or “center”). The default is “right” and “bottom”, but we can change it to, say, top right.
tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values") +
  tm_layout(legend.position = c("right", "top"))

Yuck. We can leave it at the bottom right. Or there is also the option to position the legend outside the frame of the map. This is accomplished by setting legend.outside to TRUE (the default is FALSE) in tm_layout().
tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values") +
  tm_layout(legend.outside = TRUE)

We can also customize the size of the legend, its alignment, font, etc. We refer to GWR for specifics.
Another functionality of the tm_layout() function is to set a title for the map, and specify its position, size, etc. For example, we can set the title using main.title, and the size using main.title.size as in the example below. We made the font size a bit smaller (0.95) in order not to overwhelm the map.
tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2017-21 Median Housing Values in Sacramento County",
            main.title.size = 0.95, legend.outside = TRUE)

You can change the title position using main.title.position. For example, we center the title by specifying “center”.
tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2015-19 Median Housing Values in Sacramento County",
            main.title.size = 0.95, main.title.position = "center",
            legend.outside = TRUE)
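In tmap 3.x, tm_layout() also has a title argument, which is separate from main.title. The sketch below puts the two side by side. It assumes that title is drawn inside the map frame, positioned with title.position, while main.title sits above the frame. The North Carolina counties from the sf package stand in for the Sacramento tracts, and tmap_arrange() stacks the two maps.

```r
library(sf)
library(tmap)

# North Carolina counties from the sf package stand in for sac.county.tracts
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
tmap_mode("plot")

base.map <- tm_shape(nc) +
  tm_polygons(col = "BIR74", style = "quantile", palette = "Reds",
              title = "Births, 1974")

# title: drawn inside the map frame
map.inside <- base.map +
  tm_layout(title = "Title inside the frame", title.position = c("left", "top"),
            legend.outside = TRUE)

# main.title: drawn above the map, outside the frame
map.above <- base.map +
  tm_layout(main.title = "Title above the map", main.title.size = 0.95,
            main.title.position = "center", legend.outside = TRUE)

tmap_arrange(map.inside, map.above, ncol = 1)
```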

We still need to add the other key map elements described in Handout 5. This is where we start adding more layout functions after tm_polygons() using the + operator. The first is the scale bar, which you can add using the function tm_scale_bar().
tm_shape(sac.county.tracts, unit = "mi") +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2015-19 Median Housing Values in Sacramento County",
            main.title.size = 0.95, main.title.position = "center",
            legend.outside = TRUE) +
  tm_scale_bar(breaks = c(0, 5, 10), text.size = 0.75,
               position = c("right", "bottom"))

The argument breaks within tm_scale_bar() tells R where to divide the bar and where it ends. Make sure you use reasonable break points. Sacramento County is not, for example, 200 miles wide, so you should not use c(0, 100, 200) (try it and see what happens; you won’t like it). Note that the scale is in miles (we’re in America!). The default unit is kilometers (the rest of the world!), but you can specify the units within tm_shape() using the argument unit. Here, we used unit = "mi" so that distances in the scale bar are measured in miles. The position = argument places the scale bar at the bottom right of the map. The argument text.size = controls the size of the scale bar text. We decrease it by 25% (0.75).
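One way to choose sensible breaks is to measure how wide your layer actually is before picking numbers. The sketch below does this with sf. It assumes the layer is first projected to a CRS measured in meters. EPSG:32119 (NAD83 / North Carolina) suits the sf example file used here. For your own data, pick a projected CRS appropriate for your area. Base R’s pretty() then proposes a few round numbers covering roughly a quarter of the map width. The quarter-width rule is just an illustrative choice. You could pass the result to tm_scale_bar(breaks = ...).

```r
library(sf)

# Example data shipped with sf (stands in for your own layer)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Project to a CRS whose units are meters (EPSG:32119, NAD83 / North Carolina)
nc.proj <- st_transform(nc, 32119)

# East-west extent of the layer, converted from meters to miles
bb <- st_bbox(nc.proj)
width.mi <- as.numeric(bb["xmax"] - bb["xmin"]) / 1609.344

# A few round break points covering roughly a quarter of the map width
scale.breaks <- pretty(c(0, width.mi / 4), n = 2)
width.mi
scale.breaks
```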
The next element is the north arrow, which we can add using the function tm_compass(). You can control the type, size and location of the arrow within this function. We place a 4-star arrow at the top left of the map.
tm_shape(sac.county.tracts, unit = "mi") +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2015-19 Median Housing Values in Sacramento County",
            main.title.size = 0.95, main.title.position = "center",
            legend.outside = TRUE) +
  tm_scale_bar(breaks = c(0, 5, 10), text.size = 0.75,
               position = c("right", "bottom")) +
  tm_compass(type = "4star", position = c("left", "top"))
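To compare the compass styles before choosing one, you can build one small map per style and arrange them together. This sketch assumes the five type values accepted by tm_compass() in tmap 3.x ("arrow", "4star", "8star", "radar", "rose"). The sf North Carolina file again stands in for the Sacramento data.

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
tmap_mode("plot")

# Compass styles accepted by tm_compass(type = ...) in tmap 3.x
compass.types <- c("arrow", "4star", "8star", "radar", "rose")

# One outline map per compass style, labeled with the style name
compass.maps <- lapply(compass.types, function(ct) {
  tm_shape(nc) +
    tm_borders() +
    tm_compass(type = ct, position = c("left", "top")) +
    tm_layout(title = ct)
})

# Show the versions side by side
tmap_arrange(compass.maps, ncol = 3)
```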

We can make the map prettier by changing a variety of settings. For example, we can eliminate the frame around the map using the argument frame = FALSE in tm_layout(). We also add back the parks.
sac.map <- tm_shape(sac.county.tracts, unit = "mi") +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2015-19 Median Housing Values in Sacramento County",
            main.title.size = 0.95, main.title.position = "center",
            legend.outside = TRUE, frame = FALSE) +
  tm_scale_bar(breaks = c(0, 5, 10), text.size = 0.75,
               position = c("right", "bottom")) +
  tm_compass(type = "4star", position = c("left", "top")) +
  # second layer: parks drawn in green on top of the tracts
  tm_shape(parks) +
  tm_polygons(col = "green")
sac.map

Notice that we stored the map in an object called sac.map. R is an object-oriented language, so everything you create in R is an object that can be stored for future manipulation, and this includes maps. You should see sac.map in your Environment window. Because the map is stored, you can access it anytime during your current R session.
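Storing a map also lets you build variations on it without retyping the whole call, because + adds new elements to a copy of the stored object. The sketch below assumes tmap 3.x and uses the sf North Carolina file in place of sac.county.tracts. The object names base.map, compass.map and clean.map are illustrative only.

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
tmap_mode("plot")

# Store a basic map in an object
base.map <- tm_shape(nc) +
  tm_polygons(col = "BIR74", style = "quantile", palette = "Reds")

# Build new versions on top of the stored object; base.map itself is unchanged
compass.map <- base.map + tm_compass(position = c("left", "top"))
clean.map <- base.map + tm_layout(frame = FALSE, legend.outside = TRUE)

class(base.map)
clean.map
```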
Check the full list of tm_ elements in the tmap reference documentation.
You can save your maps a couple of ways. One is the function tmap_save(). For example, we can save the map object sac.map as follows:
tmap_save(sac.map, "saccountyhval.png")
Specify the tmap object and a filename with an extension. The extension determines the file format. tmap_save() supports pdf, eps, svg, wmf, png, jpg, bmp and tiff, and the default is png. Also make sure you’ve set your working directory to the folder where you want your map to be saved.
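As an alternative to changing the working directory, you can give tmap_save() a full path, and you can also control the image size and resolution. The sketch below writes to R’s temporary directory purely for illustration. It assumes the width, height, units and dpi arguments of tmap_save() in tmap 3.x. The North Carolina file from sf stands in for sac.map’s data, and the file name nc_births.png is hypothetical.

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
tmap_mode("plot")  # make sure we are saving a static map

nc.map <- tm_shape(nc) +
  tm_polygons(col = "BIR74", style = "quantile", palette = "Reds")

# Build the output path explicitly instead of relying on the working directory
out.file <- file.path(tempdir(), "nc_births.png")

# width and height are in inches here; dpi sets the resolution of the PNG
tmap_save(nc.map, filename = out.file, width = 7, height = 4, units = "in", dpi = 300)

file.exists(out.file)
```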
So far we’ve created static maps, that is, maps that don’t “move”. But we’re all likely used to Google or Bing maps, which we can pan around and zoom into. You can make interactive maps like these in R using the tmap package.
To make your tmap object interactive, use the function tmap_mode() and pass “view” to it.
tmap_mode("view")
Now that interactive mode has been ‘turned on’, all maps produced with tm_shape() will be displayed as interactive maps (in RStudio, they open in the Viewer pane). Let’s view our saved sac.map interactively.
sac.map
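The mode stays switched on for the rest of your R session. If you later want static output again, for example before saving a PNG with tmap_save(), switch back with tmap_mode("plot"). The sketch below shows the round trip with the sf North Carolina file standing in for sac.map’s data. tmap 3.x also provides ttm() as a shortcut that toggles between the two modes.

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

nc.map <- tm_shape(nc) +
  tm_polygons(col = "BIR74", style = "quantile", palette = "Reds")

tmap_mode("view")   # maps now print as interactive (Leaflet) maps
nc.map

tmap_mode("plot")   # switch back to static maps
nc.map
```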
Click on the expand icon above the map and a larger window should open up.
Besides interactivity, another important benefit of tmap_mode("view") is that it provides a basemap, which was discussed in Handout 5. A basemap provides the background detail needed to orient the reader to where the mapped area is. In the static maps we produced earlier, Sacramento County was sort of floating in white space. As you can see in the interactive map above, the basemap adds geographic context for the surrounding area.
The default basemap in view mode is CartoDB.Positron. You can change the basemap using the tm_basemap() function. For example, let’s change the basemap to OpenStreetMap.
sac.map + tm_basemap("OpenStreetMap")
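The basemap names that tm_basemap() accepts are the tile provider names from the leaflet package, so you can list them with names(leaflet::providers). The sketch below also assumes that, in tmap 3.x, giving tm_basemap() a vector of providers adds all of them to the interactive map, so you can switch between them with the layer control. The sf North Carolina file stands in for sac.map, and the polygons are made partly transparent with alpha so that the basemap shows through.

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Basemap names come from the tile providers listed by the leaflet package
provider.names <- names(leaflet::providers)
head(provider.names)
c("CartoDB.Positron", "OpenStreetMap", "Esri.WorldTopoMap") %in% provider.names

tmap_mode("view")
nc.map <- tm_shape(nc) +
  tm_polygons(col = "BIR74", style = "quantile", palette = "Reds", alpha = 0.7) +
  # several providers: switch between them with the layer control
  tm_basemap(c("OpenStreetMap", "Esri.WorldTopoMap"))
nc.map

tmap_mode("plot")  # return to static maps
```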